Scattering Representation of Modulated Sounds

Authors

  • Joakim Andén
  • Stéphane Mallat
Abstract

Mel-frequency spectral coefficients (MFSCs), calculated by averaging the spectrogram along a mel-frequency scale, are used in many audio classification tasks. Their efficiency can be partly explained by their stability to deformation in a Euclidean norm. However, averaging the spectrogram loses high-frequency information. This loss is reduced by keeping the window size small, around 20 ms, which in turn prevents MFSCs from capturing large-scale structures. Scattering coefficients recover part of this lost information using a cascade of wavelet decompositions and modulus operators, enabling larger window sizes. This representation is sufficiently rich to capture note attacks, amplitude and frequency modulation, as well as chord structure.
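
The cascade described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gabor filters, the circular convolution, the global time average (standing in for a long averaging window), and all names and parameter values below are assumptions chosen for illustration. Two tones share a carrier but differ in amplitude-modulation rate. The first-order coefficient (average of one wavelet modulus) should come out nearly equal for both, while the second-order coefficients (a second wavelet modulus applied to the first envelope) should separate them.

```python
import cmath
import math


def gabor(center_freq, sigma, half_width):
    """Complex Gabor band-pass filter (hypothetical stand-in for a wavelet).

    center_freq is in cycles per sample, sigma is the Gaussian width in samples.
    """
    return [math.exp(-0.5 * (n / sigma) ** 2) * cmath.exp(2j * math.pi * center_freq * n)
            for n in range(-half_width, half_width + 1)]


def wavelet_modulus(x, h):
    """Modulus of the circular convolution of signal x with filter h."""
    n_samples, half = len(x), len(h) // 2
    out = []
    for t in range(n_samples):
        acc = 0j
        for k, hk in enumerate(h):
            acc += x[(t - (k - half)) % n_samples] * hk
        out.append(abs(acc))
    return out


def scattering(x, psi1, psi2_bank):
    """Return (first-order coefficient, list of second-order coefficients).

    Assumption: averaging is done over the whole signal instead of a
    sliding low-pass window, and only one first-order filter is used.
    """
    u1 = wavelet_modulus(x, psi1)                 # envelope in the psi1 band
    s1 = sum(u1) / len(u1)                        # first order: averaged envelope
    s2 = []
    for psi2 in psi2_bank:
        u2 = wavelet_modulus(u1, psi2)            # modulation content of the envelope
        s2.append(sum(u2) / len(u2))              # second order: averaged again
    return s1, s2


def am_tone(n_samples, carrier, rate, depth=0.5):
    """Amplitude-modulated cosine; frequencies in cycles per sample."""
    return [(1 + depth * math.cos(2 * math.pi * rate * t)) * math.cos(2 * math.pi * carrier * t)
            for t in range(n_samples)]


N = 400                                           # all frequencies complete whole cycles in N samples
psi1 = gabor(0.25, 4.0, 16)                       # first-order filter on the carrier
psi2_bank = [gabor(0.01, 60.0, 180),              # second-order filter tuned to slow modulation
             gabor(0.03, 60.0, 180)]              # second-order filter tuned to fast modulation

s1_slow, s2_slow = scattering(am_tone(N, 0.25, 0.01), psi1, psi2_bank)
s1_fast, s2_fast = scattering(am_tone(N, 0.25, 0.03), psi1, psi2_bank)

print("first order :", s1_slow, s1_fast)
print("second order:", s2_slow, s2_fast)
```

In this toy setting, the averaged first-order envelope discards the modulation rate, which mirrors the information loss the abstract attributes to averaging; the second-order coefficients retain it even though the averaging spans the full signal.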

Similar resources

The Representation of Non-Linguistic Sounds in Persian and English Subtitles for the Deaf and Hard-of-Hearing: A Comparative Study

Subtitling for the deaf and hard-of-hearing (SDH) is an area which deserves special attention, as it enables these people to access the part of the ‘world’ intended for hearing people, including the world of ‘motion pictures’, and particularly movie sounds. Compared to linguistic sounds, non-linguistic sounds have received little attention in the field of translation, although they are in...

Transformée en scattering sur la spirale temps-chroma-octave

We introduce a scattering representation for the analysis and classification of sounds. It is locally translation-invariant, stable to deformations in time and frequency, and has the ability to capture harmonic structures. The scattering representation can be interpreted as a convolutional neural network which cascades a wavelet transform in time and along a harmonic spiral. We study its applic...

Calculation of the proton-deuteron breakup scattering cross section at intermediate energies

In this paper, we reformulate three-nucleon breakup scattering in leading order approximation by considering spin-isospin degrees of freedom. At first, considering the inhomogeneous part of Faddeev equation, which is a valid approximation in high and intermediate energies, we present the Faddeev equation as a function of vector Jacobi momenta and spin and isospin quantum numbers. In this new fo...

Separate neural systems for processing action- or non-action-related sounds.

The finding of a multisensory representation of actions in a premotor area of the monkey brain suggests that similar multimodal action-matching mechanisms may also be present in humans. Based on the existence of an audiovisual mirror system, we investigated whether sounds referring to actions that can be performed by the perceiver underlie different processing in the human brain. We recorded mu...

Wavelet Scattering on the Pitch Spiral

We present a new representation of harmonic sounds that linearizes the dynamics of pitch and spectral envelope, while remaining stable to deformations in the time-frequency plane. It is an instance of the scattering transform, a generic operator which cascades wavelet convolutions and modulus nonlinearities. It is derived from the pitch spiral, in that convolutions are successively performed in...

Publication date: 2012